22 research outputs found

    How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

    Full text link
    Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). We empirically show that progressively increasing the discount factor up to its final value makes it possible to significantly reduce the number of learning steps. When used in conjunction with a varying learning rate, this strategy outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma. Comment: NIPS 2015 Deep Reinforcement Learning Workshop.
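
    As a rough illustration of this kind of dynamic strategy, the sketch below grows the discount factor toward its final value while decaying the learning rate over training. The schedule shapes and hyperparameter values are assumptions made for the example, not the paper's exact settings.

```python
# Sketch: dynamic discount factor and learning rate schedules (illustrative values only).
def discount_schedule(step, gamma0=0.90, gamma_final=0.99, rate=0.02):
    """Discount factor growing geometrically from gamma0 toward gamma_final."""
    return gamma_final - (gamma_final - gamma0) * (1.0 - rate) ** step

def learning_rate_schedule(step, lr0=2.5e-4, decay=0.99):
    """Learning rate decaying geometrically as the discount factor grows."""
    return lr0 * decay ** step

# Example: values that would be fed to the Q-network update at each training phase.
for phase in range(0, 250, 50):
    print(f"phase {phase}: gamma={discount_schedule(phase):.4f}, "
          f"lr={learning_rate_schedule(phase):.6f}")
```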

    Combined Reinforcement Learning via Abstract Representations

    Full text link
    In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper, we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration, and transfer learning. Comment: Accepted to the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
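
    A rough sketch of the kind of architecture this suggests is given below, assuming PyTorch and illustrative layer sizes; the module names are hypothetical and not taken from the paper. A single encoder produces a low-dimensional abstract state that is shared by a model-free Q head and a model-based transition/reward model, so that planning can be carried out in the small latent space.

```python
import torch
import torch.nn as nn

class SharedAbstractionAgent(nn.Module):
    """Sketch: model-free and model-based heads share one low-dimensional encoding."""
    def __init__(self, obs_dim, n_actions, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(          # shared abstract representation
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))
        self.q_head = nn.Linear(latent_dim, n_actions)                    # model-free values
        self.transition = nn.Linear(latent_dim + n_actions, latent_dim)   # latent dynamics model
        self.reward = nn.Linear(latent_dim + n_actions, 1)                # latent reward model

    def forward(self, obs, action_onehot):
        z = self.encoder(obs)                  # planning happens on z, not on raw observations
        za = torch.cat([z, action_onehot], dim=-1)
        return self.q_head(z), self.transition(za), self.reward(za)

# Toy forward pass
agent = SharedAbstractionAgent(obs_dim=8, n_actions=3)
q, next_z, r = agent(torch.randn(1, 8), torch.eye(3)[:1])
```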

    Simple connectome inference from partial correlation statistics in calcium imaging

    Full text link
    In this work, we propose a simple yet effective solution to the problem of connectome inference in calcium imaging data. The proposed algorithm consists of two steps: first, the raw signals are processed to detect neural peak activities; second, the degree of association between neurons is inferred from partial correlation statistics. This paper summarises the methodology that led us to win the Connectomics Challenge, proposes a simplified version of our method, and finally compares our results with those of other inference methods.
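
    A simplified sketch of that two-step pipeline is shown below; the peak-detection rule, threshold, and regularization are placeholder choices for the illustration, not the winning entry's exact settings.

```python
import numpy as np

def infer_connectome(fluorescence, peak_threshold=0.1):
    """fluorescence: array of shape (n_timesteps, n_neurons); returns an association matrix."""
    # Step 1: crude peak detection -- binarize the positive temporal derivative of the signal.
    spikes = (np.diff(fluorescence, axis=0) > peak_threshold).astype(float)

    # Step 2: partial correlations, obtained from the precision (inverse covariance) matrix.
    cov = np.cov(spikes, rowvar=False) + 1e-6 * np.eye(spikes.shape[1])  # small ridge for stability
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(d, d)
    np.fill_diagonal(partial_corr, 0.0)
    return partial_corr  # larger values suggest a more direct association between two neurons

scores = infer_connectome(np.random.rand(1000, 10))  # toy example on random data
```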

    A Machine with Short-Term, Episodic, and Semantic Memory Systems

    Full text link
    Inspired by the cognitive science theory of explicit human memory systems, we have modeled an agent with short-term, episodic, and semantic memory systems, each of which is modeled with a knowledge graph. To evaluate this system and analyze the behavior of this agent, we designed and released our own reinforcement learning agent environment, "the Room", where an agent has to learn how to encode, store, and retrieve memories to maximize its return by answering questions. We show that our deep Q-learning based agent successfully learns whether a short-term memory should be forgotten, or rather be stored in the episodic or semantic memory systems. Our experiments indicate that an agent with human-like memory systems can outperform an agent without this memory structure in the environment.
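
    The toy sketch below illustrates the data-structure side of this idea: three stores holding (head, relation, tail) triples, with a management action that a learned policy would pick for each short-term memory. Class and method names are made up for the illustration and are not the released environment's API.

```python
class MemorySystems:
    """Toy illustration: short-term, episodic, and semantic stores of knowledge-graph triples."""
    def __init__(self):
        self.short_term = []   # most recent observations, limited capacity
        self.episodic = []     # specific events, kept together with their timestep
        self.semantic = {}     # generalized facts, kept with an occurrence count

    def manage_oldest(self, action):
        """Apply the memory-management action that a learned policy would choose."""
        head, relation, tail, t = self.short_term.pop(0)
        if action == "episodic":
            self.episodic.append((head, relation, tail, t))
        elif action == "semantic":
            key = (head, relation, tail)
            self.semantic[key] = self.semantic.get(key, 0) + 1
        # action == "forget": the triple is simply dropped

mem = MemorySystems()
mem.short_term.append(("agent", "at_location", "kitchen", 3))
mem.manage_oldest("semantic")
```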

    Contributions to deep reinforcement learning and its applications in smartgrids

    Full text link
    Reinforcement learning and its extension with deep learning have led to a field of research called deep reinforcement learning. Applications of that research have recently demonstrated the ability to solve complex decision-making tasks that were previously believed to be extremely difficult for a computer. Yet, deep reinforcement learning requires caution and an understanding of its inner mechanisms in order to be applied successfully in different settings. As an introduction, we provide a general overview of the field of deep reinforcement learning. The thesis is then divided into two parts. In the first part, we provide an analysis of reinforcement learning in the particular setting of a limited amount of data and in the general context of partial observability. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. An original theoretical contribution relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. We also discuss and empirically illustrate the role of other parameters in optimizing the bias-overfitting tradeoff: the function approximator (in particular deep learning) and the discount factor. In addition, we investigate the specific case of the discount factor in the deep reinforcement learning setting, where additional data can be gathered through learning. In the second part of this thesis, we focus on a smartgrids application that falls in the context of a partially observable problem and where a limited amount of data is available (as studied in the first part of the thesis). We consider the case of microgrids featuring photovoltaic (PV) panels associated with both long-term (hydrogen) and short-term (battery) storage devices. We propose a novel formalization of the problem of building and operating microgrids interacting with their surrounding environment. Under a deterministic assumption, we show how to optimally operate and size microgrids using linear programming techniques. We then show how to use deep reinforcement learning to solve the operation of microgrids under uncertainty, where, at every time step, the uncertainty comes from the lack of knowledge about future electricity consumption and weather-dependent PV production.
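
    To make the deterministic linear-programming part concrete, the sketch below optimizes battery operation against known consumption, PV production, and grid prices over a toy horizon. The data and the single lossless battery are simplifications introduced for the example; the thesis formulation also covers hydrogen storage and sizing decisions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy deterministic microgrid: known PV production, consumption, and grid prices per time step.
pv     = np.array([0.0, 3.0, 4.0, 0.0])
demand = np.array([2.0, 1.0, 1.0, 3.0])
price  = np.array([1.0, 1.0, 1.0, 2.0])
T, capacity, s_init = len(pv), 5.0, 0.0

# Decision variables per step: grid import g, battery charge c, discharge d, state of charge s.
idx = lambda block, t: block * T + t   # blocks: 0 = g, 1 = c, 2 = d, 3 = s
n = 4 * T

cost = np.zeros(n)
cost[:T] = price                       # objective: minimize the cost of imported electricity

A_eq, b_eq = np.zeros((2 * T, n)), np.zeros(2 * T)
for t in range(T):
    # Energy balance: g_t + pv_t + d_t - c_t = demand_t
    A_eq[t, idx(0, t)], A_eq[t, idx(1, t)], A_eq[t, idx(2, t)] = 1, -1, 1
    b_eq[t] = demand[t] - pv[t]
    # Storage dynamics: s_t - s_{t-1} - c_t + d_t = 0, with s_{-1} = s_init
    row = T + t
    A_eq[row, idx(3, t)], A_eq[row, idx(1, t)], A_eq[row, idx(2, t)] = 1, -1, 1
    if t > 0:
        A_eq[row, idx(3, t - 1)] = -1
    else:
        b_eq[row] = s_init

bounds = [(0, None)] * (3 * T) + [(0, capacity)] * T
res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("minimum import cost:", res.fun)     # optimal operation over the toy horizon
```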

    Using approximate dynamic programming for estimating the revenues of a hydrogen-based high-capacity storage device

    Full text link
    This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the Dynamic Programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.
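
    A small sketch of the dynamic-programming idea is given below: value iteration backwards in time over a discretized storage level, choosing at each hour whether to buy and store, release and sell, or stay idle. The prices, discretization, and single round-trip efficiency are toy assumptions; the paper's model covers the actual electrolysis and fuel-cell characteristics.

```python
import numpy as np

def max_revenue(prices, capacity=10, step=1, efficiency=0.7):
    """Maximum revenue of a storage device trading on the day-ahead market (backward DP).

    prices: day-ahead electricity price for each hour; step: energy bought or sold per hour.
    """
    n_levels = capacity // step + 1
    value = np.zeros(n_levels)                 # value-to-go at the end of the horizon
    for t in reversed(range(len(prices))):
        new_value = np.empty(n_levels)
        for i in range(n_levels):
            best = value[i]                                            # idle
            if i + 1 < n_levels:                                       # buy energy, store it
                best = max(best, -prices[t] * step + value[i + 1])
            if i - 1 >= 0:                                             # release energy, sell it
                best = max(best, prices[t] * step * efficiency + value[i - 1])
            new_value[i] = best
        value = new_value
    return value[0]                            # start from an empty storage device

print(max_revenue([30, 20, 25, 60, 55, 40]))   # toy price profile over six hours
```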